Introduction

Most critics consider the Victorian age a golden age of English literature. During this era, the novel became the dominant literary genre. While novels gained popularity, the gender ratio of their authors remained fairly stagnant, which skewed the perspectives these novels offered. Despite this lack of representation, women did take up writing, in one of two ways. Some wrote as themselves, offering a woman's perspective under a woman's name. Others wrote under the guise of a man, offering a less masculine perspective on women from behind a masculine name. Both groups played a pivotal role in paving the way for modern literature. Comparing their writing styles lets us examine how an author's perceived gender shapes the perspective of the work. Emotions play a large role in defining perspective, so comparing the emotional density of characters written by both types of writers should help us understand why a woman might have chosen one approach over the other.

Research hypothesis

Our research hypothesis is that, because George Eliot wrote under a masculine pen name, her female characters would be more stereotypically emotional than the female characters of Elizabeth Gaskell’s novels.

Corpus Description

Our hypothesis needed a corpus of two authors from the same time period, a requirement we built into the hypothesis itself. Mary Ann Evans was an English novelist and poet of the Victorian era. She wrote seven novels under the masculine pen name George Eliot. She is widely known for her works under the pseudonym, which led us to consider how those works might have differed in the absence of a masculine pen name. Another author from the same period, Elizabeth Gaskell, offers an ideal comparison for analysing how writing from behind a masculine facade might affect the resulting work. Gaskell was an English novelist of the Victorian era whose publication timeline overlaps with Eliot’s. We chose eight of her novels that best fit that timeframe; eight against Eliot’s seven also keeps the comparison between the two authors roughly even. From Gaskell, we picked Mary Barton (1848), Cranford (1851–53), Ruth (1853), North and South (1854–55), My Lady Ludlow (1858), A Dark Night’s Work (1863), Sylvia’s Lovers (1863) and Wives and Daughters: An Everyday Story (1864–66). From Eliot, we picked Adam Bede (1859), The Mill on the Floss (1860), Silas Marner (1861), Romola (1863), Felix Holt, the Radical (1866), Middlemarch (1871–72) and Daniel Deronda (1876).

Summary Paragraph

We present the graphs as separate visualizations because the data could not be shown as coherently in a single graph. This corpus has 12 documents with 2,789,254 total words and 38,941 unique word forms. The four longest and four shortest texts are evenly distributed between the two authors. We created named entity recognition (NER) visualizations for Eliot’s and Gaskell’s texts to show how each author portrays the emotional nature of her male and female characters.

Data visualisation 1

Before we discuss the implications of the positive and negative characters, it’s useful to consider an outlier. In chapter 85 of Middlemarch, George Eliot quotes Pilgrim’s Progress. The quotation refers to characters named after negative emotions and traits: Mr Malice, Mr Liar, Mr Enmity, Mr Cruelty and so on. These names skew Middlemarch’s data for the most negatively viewed characters, which in turn undermines any comparison with its positive characters. For this reason, we removed these characters to get a more accurate representation of the negative characters. Several other texts have mildly skewed data because of references to other books. A notable example is Daniel Deronda, which has multiple negative references to Paradise, Pilgrim and Madonna Pia, arising from its allusions to older texts such as the Divine Comedy and Paradise Lost.

Eliot’s radar plots show some variation between female and male characters, with female characters covering a wider emotional range and male characters appearing more emotionally nonchalant; the radar plots for Eliot’s men are tightly concentrated compared with those for Gaskell’s men. While there are instances of this pattern, it is not consistent enough to support the claim. For example, Mary has a wildly emotional radar plot, but Aunt Alice has a closed, centred one. Some of the men in Eliot’s novels also have extremely emotional radar plots. Most of Gaskell’s novels do show a higher range of emotion, either high or centred. Gaskell also shows a larger variety of emotions, with some characters showing immense joy and fear while others show immense trust and anticipation.

On closer inspection, we also notice that the radar plots for all of Gaskell’s novels show a wider emotional range overall, both positive and negative, than those for Eliot’s novels. In essence, both positive and negative emotions range up to 2500, tempering the conventional over-dramatic depiction of female emotion. She depicted the female characters with the same social-emotional value but modified that of the men so that the contrast between the two seems less stark.

Reflection upon hypothesis

Our hypothesis was partially supported. Eliot did write her female characters through a conventionally masculine lens, portraying them as emotional in comparison to their nonchalant and stoic male counterparts. However, she also depicted a few male characters as emotional, which gives a less skewed picture of the women by giving a more accurate picture of the men. Gaskell’s representation of the women in her novels was not as unskewed as we expected, but she did portray them in a kinder light. Like Eliot, she adjusted the depiction so that the men too were portrayed as just as emotional. Neither author made her women vastly different from the conventionally emotional women of the time, although Gaskell did depict them more fairly. What both authors did was heighten the emotions of the men in these tales, counterbalancing the “over-emotional female” trope.

General reflection

Analyzing texts in R was vastly different from working in Voyant. Because we were working remotely, Voyant was the easier and more uniform tool when it came to issues and errors. With R, the two of us hit different errors with the same code, ranging from reading files and accessing online copies of the files to creating the separate visualizations. Once we had the visualizations, however, R gave us immense control over the intricacies of the data. We could manipulate and omit certain repetitive elements, account for character names that might influence the data, and account for how emotions play out in a sentence or with reference to a character. Cleaning the data was immensely time-consuming, and it was something we did not have to deal with in Voyant, but it made it easier to obtain more accurate and specific visualizations. However, one must have a general knowledge of the books and their main characters to accurately weed out the erroneous entries produced by the NER tool.

We could analyze sentiment and net sentiment, both around characters and around lines. This allowed us to create visualizations of the positive and negative emotions surrounding certain characters, as well as to observe their net value. None of this was available in Voyant. While correlations and collocates give a certain insight into a corpus, they are not the best way to understand how characters are crafted. R also allows us to isolate the total emotion for a particular character and to adjust these values for how frequently the character appears in a text.
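As a rough illustration of the two quantities mentioned above, the sketch below computes a net sentiment and a frequency-adjusted value in base R. The data frame, its column names and its numbers are hypothetical, and dividing by the number of appearances is only one plausible way to adjust for frequency; it is not necessarily the adjustment used for the graphs in this report.

```r
# Hypothetical per-character counts; illustrative numbers only,
# not taken from either author's novels.
char_sentiment <- data.frame(
  entity      = c("Character A", "Character B"),
  positive    = c(120, 30),
  negative    = c(80, 45),
  appearances = c(400, 50),
  stringsAsFactors = FALSE
)

# Net sentiment: positive words minus negative words around the character.
char_sentiment$net <- char_sentiment$positive - char_sentiment$negative

# Assumed adjustment: net sentiment per appearance, so that frequently
# mentioned characters do not dominate simply because they appear more.
char_sentiment$net_per_appearance <-
  char_sentiment$net / char_sentiment$appearances

print(char_sentiment)
```

Here Character A has the larger raw net value, but Character B has the stronger (negative) value once the number of appearances is taken into account.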

Coding tingz

We first unnested the sentences, and got a view of the words that “surround” the entities. Then we created a separate table of sentiments by sentence, following which we recombined the columns of sentiments with the original entities. We did this for both authors.

entities_matches_sentiment_eliot <- read.csv("entities_matches_sentiment_eliot.csv")
entities_matches_sentiment_gaskell <- read.csv("entities_matches_sentiment_gaskell.csv")
ner_total_sentiment_eliot <- read.csv("ner_total_sentiment_eliot.csv")
ner_total_sentiment_gaskell <- read.csv("ner_total_sentiment_gaskell.csv")
eliot_middlemarch_cleaner <- read.csv("eliot_middlemarch_cleaner.csv")
entities_matches_sentiment_middlemarch_cleaner <- read.csv("entities_matches_sentiment_middlemarch_cleaner.csv")
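The three steps described above (unnesting sentences into words, scoring sentiment per sentence, and recombining with the entities) can be sketched in base R as follows. This is a minimal sketch on toy data, not the code that produced the CSV files loaded above: the sentences, the `lexicon` table and every column name are assumptions made for illustration, and a real run would use a published sentiment lexicon.

```r
# Toy sentences and the entity found in each one (illustrative only).
sentences <- data.frame(
  sentence_id = 1:3,
  text = c("Dorothea felt a deep joy",
           "Casaubon was cold and bitter",
           "Dorothea feared the bitter truth"),
  stringsAsFactors = FALSE
)
entities <- data.frame(
  sentence_id = 1:3,
  entity = c("Dorothea", "Casaubon", "Dorothea"),
  stringsAsFactors = FALSE
)

# Hypothetical mini-lexicon standing in for a real sentiment lexicon.
lexicon <- data.frame(
  word = c("joy", "cold", "bitter", "feared"),
  sentiment = c("positive", "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# 1. Unnest: one row per lower-cased word, keeping the sentence id.
word_list <- strsplit(tolower(sentences$text), "[^a-z']+")
words <- data.frame(
  sentence_id = rep(sentences$sentence_id, lengths(word_list)),
  word = unlist(word_list),
  stringsAsFactors = FALSE
)

# 2. Score: keep only words found in the lexicon, tagged by sentiment.
scored <- merge(words, lexicon, by = "word")

# 3. Recombine: attach each scored word to the entity in its sentence,
#    then count sentiments per entity.
entity_words <- merge(entities, scored, by = "sentence_id")
entity_sentiment <- table(entity_words$entity, entity_words$sentiment)
print(entity_sentiment)
```

Joining on the sentence id is what lets the words that “surround” an entity contribute to that entity's sentiment counts.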

We then sorted the final data by book to produce the visualisations.

Visualization of top 10 Characters (Positive)

Here we have the top 10 characters from each author, ranked by positive sentiment. Note that the characters themselves may or may not be positive; the ranking only describes the context in which they appear.

Eliot’s corpus

Gaskell’s Corpus

Visualisations of top 10 characters (Negative)

Below are the top 10 characters from each author, ranked by negative sentiment. Again, the characters may or may not be inherently negative; the ranking only describes the context in which they appear.

Eliot’s corpus

We can see that the graph in Middlemarch is skewed because of the naming of characters as mentioned in the data visualisation section above. Below we see the graph with those characters removed.
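Removing the allegorical names amounts to a simple filter. The sketch below shows one way to do it in base R; the `ner_sentiment` table and its counts are made up for illustration, and the exclusion list contains only the four names cited in this report, so a real list would likely be longer.

```r
# Hypothetical NER output for Middlemarch; the counts are illustrative only.
ner_sentiment <- data.frame(
  entity   = c("Dorothea", "Mr Malice", "Mr Liar", "Lydgate", "Mr Cruelty"),
  negative = c(10, 25, 20, 8, 30),
  stringsAsFactors = FALSE
)

# Names from the Pilgrim's Progress quotation that should not count
# as Middlemarch characters.
allegorical <- c("Mr Malice", "Mr Liar", "Mr Enmity", "Mr Cruelty")

# Keep only rows whose entity is not on the exclusion list.
cleaned <- ner_sentiment[!ner_sentiment$entity %in% allegorical, ]
print(cleaned)
```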

Gaskell’s Corpus

Chart of top characters by appearance

A more useful graph for us shows the sentiments of all the major characters. We identify the major characters by sorting on how often each one appears.
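Selecting major characters by appearance can be sketched as counting entity mentions and keeping the most frequent ones. The `mentions` table below is a hypothetical stand-in with one row per detected mention, and the cut-off of two is arbitrary.

```r
# One row per detected mention of a character (placeholder names).
mentions <- data.frame(
  entity = c("A", "B", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Count mentions per character and sort from most to least frequent.
counts <- sort(table(mentions$entity), decreasing = TRUE)

# Keep the top n characters by appearance (n = 2 here for brevity).
top_chars <- head(counts, 2)
print(top_chars)
```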

Eliot’s Corpus

Gaskell’s Corpus

A look at “total emotion”

These radar plots essentially depict the most emotional characters from each novel. These characters are determined by how many different emotions are tagged to them.
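One way to read “total emotion” is as the sum of a character's counts across the emotion categories. The sketch below assumes that reading and uses only the four emotions named earlier in this report; the table, its column names and its values are hypothetical, and the measure behind the radar plots may be defined differently.

```r
# Hypothetical emotion counts per character (illustrative values only).
emotions <- data.frame(
  entity       = c("Character A", "Character B", "Character C"),
  joy          = c(5, 1, 0),
  fear         = c(3, 0, 2),
  trust        = c(4, 2, 1),
  anticipation = c(2, 1, 0),
  stringsAsFactors = FALSE
)

# Assumed definition: total emotion is the row sum over emotion columns.
emotion_cols <- c("joy", "fear", "trust", "anticipation")
emotions$total_emotion <- rowSums(emotions[, emotion_cols])

# Rank characters from most to least emotional.
ranked <- emotions[order(emotions$total_emotion, decreasing = TRUE), ]
print(ranked)
```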

Eliot’s Corpus

Gaskell’s Corpus

Radar plot by positive emotion

These radar plots depict characters from the novels that appear the most positive.

Eliot’s Corpus

Gaskell’s corpus

Radar plot by negative emotion

These radar plots depict characters from the novels that appear the most negative.

Eliot’s Corpus

Here again the graphs for Middlemarch are skewed by the character names themselves, so below is a representation with those names removed.

Gaskell’s Corpus